NSF PAR Search | NSF Public Access Repository

DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery

Mall, Utkarsh; Phoo, Cheng Perng; Chiquier, Mia; Hariharan, Bharath; Bala, Kavita; Vondrick, Carl (June 2025, CVPR)

Free, publicly-accessible full text available June 10, 2026
Scale-aware Recognition in Satellite Images under Resource Constraints

Revankar, Shreelekha; Phoo, Cheng Perng; Mall, Utkarsh; Hariharan, Bharath; Bala, Kavita (April 2025, ICLR)

Free, publicly-accessible full text available April 24, 2026
Learning 3D Perception from Others' Predictions

Yoo, Jinsu; Feng, Zhenyang; Pan, Tai-Yu; Sun, Yihong; Phoo, Cheng Perng; Chen, Xiangyu; Campbell, Mark; Weinberger, Kilian Q; Hariharan, Bharath; Chao, Wei-Lun (April 2025, International Conference on Learning Representations)

Accurate 3D object detection in real-world environments requires large amounts of high-quality annotated data. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share the predictions with the ego agent (e.g., car). Naively using the received predictions as ground truth to train the detector for the ego car, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with only a handful of annotated examples, greatly reducing the amount of data needed to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.
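The first step of the distance-based curriculum described in this abstract (learning from closer units before farther ones) can be illustrated with a small sketch. This is not the authors' implementation: the function name `curriculum_stages`, the frame keys `ego_xy` and `ref_xy`, and the distance threshold are all hypothetical, and the abstract does not state how distances are measured or thresholded.

```python
import math

def curriculum_stages(frames, near_threshold_m):
    """Split frames into a 'near' and a 'far' stage by ego-to-reference distance.

    Each frame is a dict with hypothetical keys 'ego_xy' and 'ref_xy',
    holding 2D positions in metres. The threshold is an assumed parameter.
    """
    near, far = [], []
    for frame in frames:
        distance = math.dist(frame["ego_xy"], frame["ref_xy"])
        # Frames from closer reference units go to the first training stage.
        (near if distance <= near_threshold_m else far).append(frame)
    return near, far
```

Under this sketch, a detector would first be trained on the pseudo labels in `near`; the `far` frames would be used later, after their pseudo labels have been improved by self-training as the abstract describes.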
Free, publicly-accessible full text available April 28, 2026
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery

Zhou, Hangyu; Kao, Chia-Hsiang; Phoo, Cheng Perng; Mall, Utkarsh; Hariharan, Bharath; Bala, Kavita (December 2024, NeurIPS)

Full Text Available
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery

Zhou, Hangyu; Kao, Chia-Hsiang; Phoo, Cheng Perng; Mall, Utkarsh; Hariharan, Bharath; Bala, Kavita (December 2024, NeurIPS 2024)

Clouds in satellite imagery pose a significant challenge for downstream applications. A major challenge in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce AllClear, the largest public dataset for cloud removal, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns, comprising 4 million images in total. Each ROI includes complete temporal captures from the year 2022, with (1) multi-spectral optical imagery from Sentinel-2 and Landsat 8/9, (2) synthetic aperture radar (SAR) imagery from Sentinel-1, and (3) auxiliary remote sensing products such as cloud masks and land cover maps. We validate the effectiveness of our dataset by benchmarking performance, demonstrating the scaling law (the PSNR rises from 28.47 to 33.87 with 30× more data), and conducting ablation studies on the temporal length and the importance of individual modalities. This dataset aims to provide comprehensive coverage of the Earth’s surface and promote better cloud removal results.
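For readers unfamiliar with the metric quoted in this abstract, PSNR (peak signal-to-noise ratio) is commonly defined as 10 · log10(MAX² / MSE), in decibels. The sketch below uses only the Python standard library and assumes pixel values flattened into equal-length lists and scaled to [0, 1]; the function name `psnr` and the `max_value` parameter are illustrative and are not taken from the AllClear benchmark code.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the reconstruction.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under this definition, and for a single image pair, a gain of 5.4 dB (the difference between 33.87 and 28.47) corresponds to roughly a 3.5× reduction in mean squared error, since 10^0.54 ≈ 3.47.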
Full Text Available
Learning to Detect Mobile Objects from LiDAR Scans Without Labels

You, Y; Luo, Katie Z; Phoo, Cheng P; Chao, W; Sun, W; Hariharan, B; Campbell, M; Weinberger, Kilian Q (June 2024, Conference on Computer Vision and Pattern Recognition, June 2022)

Full Text Available
Better Monocular 3D Detectors with LiDAR from the Past

https://doi.org/10.1109/ICRA57147.2024.10610444

You, Yurong; Phoo, Cheng Perng; Andres_Diaz-Ruiz, Carlos; Luo, Katie Z; Chao, Wei-Lun; Campbell, Mark; Hariharan, Bharath; Weinberger, Kilian Q (May 2024, IEEE)

Full Text Available
Emergent Correspondence from Image Diffusion

Tang, Luming; Jia, Menglin; Wang, Qianqian; Phoo, Cheng; Hariharan, Bharath (December 2023, Advances in Neural Information Processing Systems)

Finding correspondences between images is a fundamental problem in computer vision. In this paper, we show that correspondence emerges in image diffusion models without any explicit supervision. We propose a simple strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT), and use them to establish correspondences between real images. Without any additional fine-tuning or supervision on the task-specific data or annotations, DIFT is able to outperform both weakly-supervised methods and competitive off-the-shelf features in identifying semantic, geometric, and temporal correspondences. Particularly for semantic correspondence, DIFT from Stable Diffusion is able to outperform DINO and OpenCLIP by 19 and 14 accuracy points respectively on the challenging SPair-71k benchmark. It even outperforms the state-of-the-art supervised methods on 9 out of 18 categories while remaining on par for the overall performance. Project page: https://diffusionfeatures.github.io.
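One common way to turn per-location image features into correspondences is nearest-neighbor search under cosine similarity. The abstract does not spell out the matching rule, so the sketch below is an assumption for illustration only; the function names `cosine` and `match_point` and the dictionary layout of the target features are hypothetical, and the feature extraction from a diffusion network is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_point(query_feature, target_features):
    """Return the (row, col) in the target image whose feature is most similar.

    target_features maps (row, col) positions to feature vectors.
    """
    return max(target_features,
               key=lambda pos: cosine(query_feature, target_features[pos]))
```

Given a feature taken at a query pixel in one image, `match_point` returns the location in the other image with the highest similarity, which is then read as the predicted corresponding point.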
Full Text Available
Distilling from Similar Tasks for Transfer Learning on a Budget

Borup, Kenneth; Phoo, Cheng; Hariharan, Bharath (September 2023, IEEE International Conference on Computer Vision)

We address the challenge of getting efficient yet accurate recognition systems with limited labels. While recognition models improve with model size and amount of data, many specialized applications of computer vision have severe resource constraints both during training and inference. Transfer learning is an effective solution for training with few labels, but often at the expense of a computationally costly fine-tuning of large base models. We propose to mitigate this unpleasant trade-off between compute and accuracy via semi-supervised cross-domain distillation from a set of diverse source models. Initially, we show how to use task similarity metrics to select a single suitable source model to distill from, and that a good selection process is imperative for good downstream performance of a target model. We dub this approach DISTILLNEAREST. Though effective, DISTILLNEAREST assumes a single source model matches the target task, which is not always the case. To alleviate this, we propose a weighted multi-source distillation method (named DISTILLWEIGHTED) that distills multiple source models trained on different domains, weighted by their relevance for the target task, into a single efficient model. Our methods need no access to source data, and merely need features and pseudo-labels of the source models. When the goal is accurate recognition under computational constraints, both DISTILLNEAREST and DISTILLWEIGHTED outperform both transfer learning from strong ImageNet initializations and state-of-the-art semi-supervised techniques such as FixMatch. Averaged over 8 diverse target tasks, our multi-source method outperforms the baselines by 5.6 and 4.5 percentage points, respectively.
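The idea of weighting several source models by their relevance can be sketched as follows. This is a deliberately simplified illustration, not the method from the paper: it assumes every source model outputs a probability distribution over the same set of classes (which need not hold in the cross-domain setting described above), and it assumes the relevance scores are already given, for example by a task similarity metric. The names `combine_teachers` and `distillation_loss` are hypothetical.

```python
import math

def combine_teachers(teacher_probs, relevance):
    """Blend per-teacher class distributions into one soft target.

    teacher_probs: list of probability lists, one per source model.
    relevance: non-negative relevance scores, one per source model (assumed given).
    """
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance scores must sum to a positive value")
    weights = [r / total for r in relevance]  # normalise to sum to 1
    num_classes = len(teacher_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, teacher_probs))
            for c in range(num_classes)]

def distillation_loss(student_probs, soft_target, eps=1e-12):
    """Cross-entropy of the student's distribution against the soft target."""
    return -sum(t * math.log(max(s, eps))
                for t, s in zip(soft_target, student_probs))
```

A student trained to minimise this loss on unlabeled target images is pulled more strongly toward the source models with higher relevance scores.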
Full Text Available
